{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "c59e5e4e",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/jerryjliu/llama_index/blob/main/docs/examples/docstore/RedisDocstoreIndexStoreDemo.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "f35495ae",
   "metadata": {},
   "source": [
    "# Redis Docstore+Index Store Demo"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c48b9493",
   "metadata": {},
   "source": [
    "If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7ca3c885",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install llama-index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a54d1c43-4b7f-4917-939f-a964f6f3dafc",
   "metadata": {},
   "outputs": [],
   "source": [
    "import nest_asyncio\n",
    "\n",
    "nest_asyncio.apply()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fa67fa07-1395-4aab-a356-72bdb302f6b2",
   "metadata": {},
   "outputs": [],
   "source": [
    "import logging\n",
    "import sys\n",
    "import os\n",
    "\n",
    "logging.basicConfig(stream=sys.stdout, level=logging.INFO)\n",
    "logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1d12d766-3ca8-4012-9da2-248be80bb6ab",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:numexpr.utils:Note: NumExpr detected 16 cores but \"NUMEXPR_MAX_THREADS\" not set, so enforcing safe limit of 8.\n",
      "Note: NumExpr detected 16 cores but \"NUMEXPR_MAX_THREADS\" not set, so enforcing safe limit of 8.\n",
      "INFO:numexpr.utils:NumExpr defaulting to 8 threads.\n",
      "NumExpr defaulting to 8 threads.\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/loganm/miniconda3/envs/llama-index/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
      "  from .autonotebook import tqdm as notebook_tqdm\n"
     ]
    }
   ],
   "source": [
    "from llama_index import (\n",
    "    SimpleDirectoryReader,\n",
    "    ServiceContext,\n",
    "    StorageContext,\n",
    ")\n",
    "from llama_index import VectorStoreIndex, SummaryIndex, SimpleKeywordTableIndex\n",
    "from llama_index.composability import ComposableGraph\n",
    "from llama_index.llms import OpenAI\n",
    "from llama_index.response.notebook_utils import display_response"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d680ec5e",
   "metadata": {},
   "source": [
    "#### Download Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9cf71177",
   "metadata": {},
   "outputs": [],
   "source": [
    "!mkdir -p 'data/paul_graham/'\n",
    "!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "f6dd9d5f-a601-4097-894e-fe98a0c35a5b",
   "metadata": {},
   "source": [
    "#### Load Documents"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e7cdaf9d-cfbd-4ced-8d4e-6eef8508224d",
   "metadata": {},
   "outputs": [],
   "source": [
    "reader = SimpleDirectoryReader(\"./data/paul_graham/\")\n",
    "documents = reader.load_data()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "bae82b55-5c9f-432a-9e06-1fccb6f9fc7f",
   "metadata": {},
   "source": [
    "#### Parse into Nodes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f97e558a-c29f-44ec-ab33-1f481da1a6ef",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.node_parser import SentenceSplitter\n",
    "\n",
    "nodes = SentenceSplitter().get_nodes_from_documents(documents)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "aff4c8e1-b2ba-4ea6-a8df-978c2788fedc",
   "metadata": {},
   "source": [
    "#### Add to Docstore"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1514211c",
   "metadata": {},
   "outputs": [],
   "source": [
    "REDIS_HOST = os.getenv(\"REDIS_HOST\", \"127.0.0.1\")\n",
    "REDIS_PORT = os.getenv(\"REDIS_PORT\", 6379)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1ba8b0da-67a8-4653-8cdb-09e39583a2d8",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/loganm/miniconda3/envs/llama-index/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
      "  from .autonotebook import tqdm as notebook_tqdm\n"
     ]
    }
   ],
   "source": [
    "from llama_index.storage.docstore import RedisDocumentStore\n",
    "from llama_index.storage.index_store import RedisIndexStore"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "60e781d1",
   "metadata": {},
   "outputs": [],
   "source": [
    "storage_context = StorageContext.from_defaults(\n",
    "    docstore=RedisDocumentStore.from_host_and_port(\n",
    "        host=REDIS_HOST, port=REDIS_PORT, namespace=\"llama_index\"\n",
    "    ),\n",
    "    index_store=RedisIndexStore.from_host_and_port(\n",
    "        host=REDIS_HOST, port=REDIS_PORT, namespace=\"llama_index\"\n",
    "    ),\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e0b18789",
   "metadata": {},
   "outputs": [],
   "source": [
    "storage_context.docstore.add_documents(nodes)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6c7a6877",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "20"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(storage_context.docstore.docs)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "528149c1-5bde-4eba-b75a-e8fa1da17d7c",
   "metadata": {},
   "source": [
    "#### Define Multiple Indexes\n",
    "\n",
    "Each index uses the same underlying Node."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "316fb6ac-2031-4d17-9999-ffdb827f46d1",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens\n",
      "> [build_index_from_nodes] Total LLM token usage: 0 tokens\n",
      "INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 0 tokens\n",
      "> [build_index_from_nodes] Total embedding token usage: 0 tokens\n"
     ]
    }
   ],
   "source": [
    "summary_index = SummaryIndex(nodes, storage_context=storage_context)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9440f405-fa75-4788-bc7c-11d021a0a17b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens\n",
      "> [build_index_from_nodes] Total LLM token usage: 0 tokens\n",
      "INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 17050 tokens\n",
      "> [build_index_from_nodes] Total embedding token usage: 17050 tokens\n"
     ]
    }
   ],
   "source": [
    "vector_index = VectorStoreIndex(nodes, storage_context=storage_context)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "364ef89f-4ba2-4b1a-b5e5-619e0e8420ef",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens\n",
      "> [build_index_from_nodes] Total LLM token usage: 0 tokens\n",
      "INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 0 tokens\n",
      "> [build_index_from_nodes] Total embedding token usage: 0 tokens\n"
     ]
    }
   ],
   "source": [
    "keyword_table_index = SimpleKeywordTableIndex(\n",
    "    nodes, storage_context=storage_context\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5c6b2141-fc77-4dec-891b-d4dad0633b35",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "20"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# NOTE: the docstore still has the same nodes\n",
    "len(storage_context.docstore.docs)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "365a025b",
   "metadata": {},
   "source": [
    "#### Test out saving and loading"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1b359a08",
   "metadata": {},
   "outputs": [],
   "source": [
    "# NOTE: docstore and index_store is persisted in Redis by default\n",
    "# NOTE: here only need to persist simple vector store to disk\n",
    "storage_context.persist(persist_dir=\"./storage\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "84b3d2f4",
   "metadata": {},
   "outputs": [],
   "source": [
    "# note down index IDs\n",
    "list_id = summary_index.index_id\n",
    "vector_id = vector_index.index_id\n",
    "keyword_id = keyword_table_index.index_id"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1593ca1d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:llama_index.indices.loading:Loading indices with ids: ['24e98f9b-9586-4fc6-8341-8dce895e5bcc']\n",
      "Loading indices with ids: ['24e98f9b-9586-4fc6-8341-8dce895e5bcc']\n",
      "INFO:llama_index.indices.loading:Loading indices with ids: ['f7b2aeb3-4dad-4750-8177-78d5ae706284']\n",
      "Loading indices with ids: ['f7b2aeb3-4dad-4750-8177-78d5ae706284']\n",
      "INFO:llama_index.indices.loading:Loading indices with ids: ['9a9198b4-7cb9-4c96-97a7-5f404f43b9cd']\n",
      "Loading indices with ids: ['9a9198b4-7cb9-4c96-97a7-5f404f43b9cd']\n"
     ]
    }
   ],
   "source": [
    "from llama_index.indices.loading import load_index_from_storage\n",
    "\n",
    "# re-create storage context\n",
    "storage_context = StorageContext.from_defaults(\n",
    "    docstore=RedisDocumentStore.from_host_and_port(\n",
    "        host=REDIS_HOST, port=REDIS_PORT, namespace=\"llama_index\"\n",
    "    ),\n",
    "    index_store=RedisIndexStore.from_host_and_port(\n",
    "        host=REDIS_HOST, port=REDIS_PORT, namespace=\"llama_index\"\n",
    "    ),\n",
    ")\n",
    "\n",
    "# load indices\n",
    "summary_index = load_index_from_storage(\n",
    "    storage_context=storage_context, index_id=list_id\n",
    ")\n",
    "vector_index = load_index_from_storage(\n",
    "    storage_context=storage_context, index_id=vector_id\n",
    ")\n",
    "keyword_table_index = load_index_from_storage(\n",
    "    storage_context=storage_context, index_id=keyword_id\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "d3bf6aaf-3375-4212-8323-777969a918f7",
   "metadata": {},
   "source": [
    "#### Test out some Queries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9bba68f3-2743-437e-93b6-ce9ba92e40c3",
   "metadata": {},
   "outputs": [],
   "source": [
    "chatgpt = OpenAI(temperature=0, model=\"gpt-3.5-turbo\")\n",
    "service_context_chatgpt = ServiceContext.from_defaults(\n",
    "    llm=chatgpt, chunk_size=1024\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "544c0565-72a0-434b-98e5-83138ebdaa2b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 26111 tokens\n",
      "> [get_response] Total LLM token usage: 26111 tokens\n",
      "INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens\n",
      "> [get_response] Total embedding token usage: 0 tokens\n"
     ]
    }
   ],
   "source": [
    "query_engine = summary_index.as_query_engine()\n",
    "list_response = query_engine.query(\"What is a summary of this document?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "39d250be",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "**`Final Response:`** This document is a narrative of the author's journey from writing and programming as a young person to pursuing a career in art. It describes his experiences in high school, college, and graduate school, and how he eventually decided to pursue art as a career. He applied to art schools and eventually was accepted to RISD and the Accademia di Belli Arti in Florence. He passed the entrance exam for the Accademia and began studying art there. He then moved to New York and worked freelance while writing a book on Lisp. He eventually started a company to put art galleries online, but it was unsuccessful. He then pivoted to creating software to build online stores, which eventually became successful. He had the idea to run the software on the server and let users control it by clicking on links, which meant users wouldn't need anything more than a browser. This kind of software, known as \"internet storefronts,\" was eventually successful. He and his team worked hard to make the software user-friendly and inexpensive, and eventually the company was bought by Yahoo. After the sale, he left to pursue his dream of painting, and eventually found success in New York. He was able to afford luxuries such as taxis and restaurants, and he experimented with a new kind of still life painting. He also had the idea to create a web app for making web apps, which he eventually pursued and was successful with. He then started Y Combinator, an investment firm that focused on helping startups, with his own money and the help of his friends Robert and Trevor. He wrote essays and books, invited undergrads to apply to the Summer Founders Program, and eventually married Jessica Livingston. After his mother's death, he decided to quit Y Combinator and pursue painting, but eventually ran out of steam and started writing essays and working on Lisp again. He wrote a new Lisp, called Bel, in itself in Arc, and it took him four years to complete. During this time, he worked hard to make the language user-friendly and precise, and he also took time to enjoy life with his family. He encountered various obstacles along the way, such as customs that constrained him even after the restrictions that caused them had disappeared, and he also had to deal with misinterpretations of his essays on forums. In the end, he was successful in creating Bel and was able to pursue his dream of painting."
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "display_response(list_response)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "036077b7-108e-4026-9628-44c694343460",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:llama_index.token_counter.token_counter:> [retrieve] Total LLM token usage: 0 tokens\n",
      "> [retrieve] Total LLM token usage: 0 tokens\n",
      "INFO:llama_index.token_counter.token_counter:> [retrieve] Total embedding token usage: 8 tokens\n",
      "> [retrieve] Total embedding token usage: 8 tokens\n",
      "INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 0 tokens\n",
      "> [get_response] Total LLM token usage: 0 tokens\n",
      "INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens\n",
      "> [get_response] Total embedding token usage: 0 tokens\n"
     ]
    }
   ],
   "source": [
    "query_engine = vector_index.as_query_engine()\n",
    "vector_response = query_engine.query(\"What did the author do growing up?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "42229e09",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "**`Final Response:`** None"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "display_response(vector_response)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ecd7719c-f663-4edb-a239-d2a8f0a5c091",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:llama_index.indices.keyword_table.retrievers:> Starting query: What did the author do after his time at YC?\n",
      "> Starting query: What did the author do after his time at YC?\n",
      "INFO:llama_index.indices.keyword_table.retrievers:query keywords: ['action', 'yc', 'after', 'time', 'author']\n",
      "query keywords: ['action', 'yc', 'after', 'time', 'author']\n",
      "INFO:llama_index.indices.keyword_table.retrievers:> Extracted keywords: ['yc', 'time']\n",
      "> Extracted keywords: ['yc', 'time']\n",
      "INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 10216 tokens\n",
      "> [get_response] Total LLM token usage: 10216 tokens\n",
      "INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens\n",
      "> [get_response] Total embedding token usage: 0 tokens\n"
     ]
    }
   ],
   "source": [
    "query_engine = keyword_table_index.as_query_engine()\n",
    "keyword_response = query_engine.query(\n",
    "    \"What did the author do after his time at YC?\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "37524641-2632-4a76-8ae6-00f1285256d9",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "**`Final Response:`** After his time at YC, the author decided to pursue painting and writing. He wanted to see how good he could get if he really focused on it, so he started painting the day after he stopped working on YC. He spent most of the rest of 2014 painting and was able to become better than he had been before. He also wrote essays and started working on Lisp again in March 2015. He then spent 4 years working on a new Lisp, called Bel, which he wrote in itself in Arc. He had to ban himself from writing essays during most of this time, and he moved to England in the summer of 2016. He also wrote a book about Lisp hacking, called On Lisp, which was published in 1993. In the fall of 2019, Bel was finally finished. He also experimented with a new kind of still life painting, and tried to build a web app for making web apps, which he named Aspra. He eventually decided to build a subset of this app as an open source project, which was the new Lisp dialect he called Arc."
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "display_response(keyword_response)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "llama-index",
   "language": "python",
   "name": "llama-index"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
